Inner Product Similarity Search using Compositional Codes
نویسندگان
چکیده
This paper addresses the nearest neighbor search problem under inner product similarity and introduces a compact code-based approach. The idea is to approximate a vector using the composition of several elements selected from a source dictionary and to represent this vector by a short code composed of the indices of the selected elements. The inner product between a query vector and a database vector is efficiently estimated from the query vector and the short code of the database vector. We show the superior performance of the proposed group M -selection algorithm that selects M elements from M source dictionaries for vector approximation in terms of search accuracy and efficiency for compact codes of the same length via theoretical and empirical analysis. Experimental results on large-scale datasets (1M and 1B SIFT features, 1M linear models and Netflix) demonstrate the superiority of the proposed approach.
منابع مشابه
Compositional Correlation Quantization for Large-Scale Multimodal Search
Efficient similarity retrieval from large-scale multimodal database is pervasive in current search systems with the big data tidal wave. To support queries across content modalities, the system should enable cross-modal correlation and computation-efficient indexing. While hashing methods have shown great potential in approaching this goal, current attempts generally failed to learn isomorphic ...
متن کاملSharing Hash Codes for Multiple Purposes
Locality sensitive hashing (LSH) is a powerful tool for sublinear-time approximate nearest neighbor search, and a variety of hashing schemes have been proposed for different similarity measures. However, hash codes significantly depend on the similarity, which prohibits users from adjusting the similarity at query time. In this paper, we propose multiple purpose LSH (mp-LSH) which shares the ha...
متن کاملAn Efficient and Secure Multi Keyword Search in Encrypted Cloud Data with Ranking
As cloud computing become more flexible & effective in terms of economy, data owners are motivated to outsource their complex data systems from local sites to commercial public cloud. But for security of data, sensitive data has to be encrypted before outsourcing, which overcomes method of traditional data utilization based on plaintext keyword search. Considering the large number of data users...
متن کاملSimilarity preserving compressions of high dimensional sparse data
The rise of internet has resulted in an explosion of data consisting of millions of articles, images, songs, and videos. Most of this data is high dimensional and sparse. The need to perform an efficient search for similar objects in such high dimensional big datasets is becoming increasingly common. Even with the rapid growth in computing power, the bruteforce search for such a task is impract...
متن کاملAsymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)
We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS). Searching with (un-normalized) inner product as the underlying similarity measure is a known difficult problem and finding hashing schemes for MIPS was considered hard. While the existing Locality Sensitive Hashing (LSH) framework is insufficient for solving MIPS, in this paper we...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1406.4966 شماره
صفحات -
تاریخ انتشار 2014